REMARKS/ARGUMENTS
Claims 1-42 were pending. In the present response, Applicants have responded to
the Examiner's rejections, leaving Claims 1-42 pending in the present application for the
Examiner's consideration. No new matter has been added.

In summary of the Office Action of December 31, 2003, the Examiner has: 
I. Objected to the specification under 35 U.S.C. §112, first paragraph;

II. Rejected Claims 1-7 and 40-42 under 35 U.S.C. § 102, as being anticipated 
by Golding et al., U.S. Pat. No. 6,640,218 ("Golding"); and

III. Rejected Claims 8-39 under 35 U.S.C. §103(a) as being unpatentable over 
Golding in view of Leshem et al., U.S. Pat. No. 6,470,383 ("Leshem") or Martin, U.S. Pat. No. 
6,338,066 ("Martin"). 

The Applicants respectfully traverse the Examiner's objections and rejections. 

I. Objection to the specification under 35 U.S.C. §112, first paragraph.

In response to the Examiner's objection, the Applicants have submitted a 
substitute specification with this response. Applicants have submitted both a clean copy of the 
substitute specification and a marked-up copy for the Examiner's review. Applicants respectfully 
submit that no new matter has been added. Furthermore, Applicants have amended the title of 
the invention to address the Examiner's objection. In light of this submission, Applicants
respectfully request that this objection be withdrawn. 

II. Rejection of Claims 1-7 and 40-42 under 35 U.S.C. §102

The Examiner has rejected Claims 1-7 and 40-42 under 35 U.S.C. §102 as being 
anticipated by Golding. Applicants submit that Golding does not disclose all of the elements of 
the claims. For example, Claim 1 recites, in part:

means for associating events with subjects, wherein counts are maintained for
each subject and subjects are associated with categories; (Emphasis Added) 

Applicants respectfully submit that Golding associates counts with individual items, rather than 
subjects aggregating multiple items, and therefore Golding does not disclose this element of 
Claim 1. 

Golding discloses a system and method for "estimating the usefulness of an item 
in a collection of information." (Col. 2, lines 25-26). In Golding, "the measure of quality of the 
item is determined based upon the actual popularity of the item and the predicted popularity of 
the item." (Col. 2, lines 40-43). Golding creates a click log "identifying every instance in which 
a user 'clicked on' (i.e. selected) an item." (Col. 6, lines 43-45). In Golding, both the actual and
predicted popularity of an item are based upon measurements of the number of times users 
select, or "click," the item. For example, Golding measures an item's actual popularity using an 
Actual Pooled Popularity value that "provides a measure of the number of overall number of 
times that the item has been selected by users," (Col. 9, lines 29-32). Additionally, Golding 
measures an item's predicted popularity using a predicted selection rate for the item that includes 
a selection rate predictor (SRP) that "serves to estimate the expected selection rate." (Col. 8, 
lines 43-54). Thus, Golding counts the number of times an item is accessed or is predicted to be 
accessed and associates those counts with the respective item. 

In contrast, Claim 1 states that "counts are maintained for each subject," not for
each item. As noted in the specification, a "'subject' generically refers to one or more of a topic, 
a term, or a category." (Specification, p. 9). The claim term "subject" is not limited to a single 
page or type of event. For example, "if the search server serves pages from a potentially large 
number of pages, tracking hits for each page might result in statistics that are too fragmentary to 
be useful. Because of this, it is often useful to aggregate hits by subject." (Specification, p. 11).

In summary, Golding associates a count with individual items, and not with a 
subject. Moreover, Golding does not disclose organizing multiple items into subjects and does 
not disclose creating any sort of aggregate measurement of the popularity of multiple items. 
Therefore, Applicants respectfully submit that Claim 1 is not anticipated by Golding and is 
patentable. Claims 40 and 42 recite similar limitations and are therefore patentable for similar
reasons. Claims 2-7 are patentable at least by virtue of their dependence on patentable
independent Claim 1. Claim 41 is patentable at least by virtue of its dependence on patentable
independent Claim 40. 

III. Rejection of Claims 8-39 under 35 U.S.C. §103.

The Examiner has rejected Claims 8-39 under 35 U.S.C. §103 as unpatentable 
over Golding in view of Leshem or Martin. Applicants submit that there is no motivation for one
of skill in the art to combine the teachings of these references, and even if these references were
combined, none of the cited references discloses or suggests all of the elements of the
claims. For example, Claim 8 recites, in part:

accumulating counts for events by subject, wherein counts for canonical
equivalents are accumulated together; (Emphasis Added). 

Applicants submit that none of the cited references disclose or suggest this element. 

As discussed above, Golding associates a count with individual items, and not with a 
subject. Moreover, Golding does not disclose organizing multiple items into subjects, let alone
accumulating a count for a subject. 

Leshem does not disclose or suggest the cited limitation of Claim 8. Leshem discloses "a 
visual Web site analysis program." (Abstract). Leshem creates "a graphical site map that shows 
the overall architecture (i.e. the structural arrangement of content objects and links) of the Web 
site." (Col. 2, lines 14-18). Leshem "displays usage data on a site map . . ., for example in the 
form of the number of 'hits' per link, the number of Web site exit events per node, or navigation 
paths taken by specific users." (Col. 3, lines 12-15). Thus, Leshem discloses visualizing web
sites and their associated usage information using the links between individual pages. Leshem
does not disclose or suggest organizing web pages by subjects, rather than links, and therefore
does not suggest accumulating counts for events by subject, as recited by Claim 8.

Similarly, Martin does not disclose or suggest the cited limitation of Claim 8. Martin 
discloses a method to "predict a given web surfer's behavior based upon past surfer behavior." 
(Col. 2, lines 15-16). Martin does not disclose or suggest the use of subjects to track the
aggregate usage of multiple pages. 



Therefore, Applicants respectfully submit that Claim 8 is patentable over the cited
references because none of the references disclose or suggest "accumulating counts for events by
subject," as recited by Claim 8. Additionally, Claims 9-39 are patentable at least by virtue of
their dependence on patentable independent Claim 8. 



CONCLUSION

In view of the foregoing, Applicants believe all claims now pending in this
Application are in condition for allowance. The issuance of a formal Notice of Allowance at an
early date is respectfully requested.

The Applicants invite the Examiner to telephone the undersigned if he believes a
telephone conference would expedite the prosecution of this application.

Respectfully submitted,

TOWNSEND and TOWNSEND and CREW LLP
Two Embarcadero Center, Eighth Floor
San Francisco, California 94111-3834
Tel: 415-576-0200
Fax: 415-576-0300
PHA:jtc
WEB SITE ACTIVITY MONITORING SYSTEM WITH
TRACKING BY CATEGORIES AND TERMS

FIELD OF THE INVENTION 

[0001] The present invention relates to a method and apparatus to provide statistical
measurements relating to Web site activity served by a server or a set of servers
where the activity relates to particular topics, terms or categories.

BACKGROUND OF THE INVENTION 
[0002] A server is a computing device that responds to requests from clients. A Web
server is a server that is connected to the global network known as the "Internet" and
that responds to requests received from Web clients over the Internet. As used herein,
the term "Web server" may also refer to a plurality of servers organized to handle a
large number of requests for a Web server, i.e., a distributed Web server system. The
term "Web site" is often used to refer to a collection of Web servers organized by a
business entity, individual or organization for their diverse purposes. The term
derives, most likely, from the language used to access a Web server. A user is said to
"go to a Web site" when the user directs his or her computer (Web client) to make a
request of one of the site's Web servers and to display the response to the user, even
though the user and the Web client do not actually move anywhere. The user perception
is that there is a location, a "site" on the Web, where this site data exists, but it
should be understood that the term "Web site" often refers to the Web server or servers
that respond to requests from Web clients, even though "site" does not necessarily
refer to the physical location of the Web servers. In fact, in many cases, the servers
of a Web site might be physically distributed to avoid downtime when local outages of
power or network service occur.

[0003] The term "Web site" typically refers to a collection of pages maintained by a
common maintainer for presentation to visitors, whether the collection is kept on one
physical server at one physical location or is distributed over many locations and/or
servers. The pages (or the data/program code needed to generate the pages dynamically)
need not be created by the common maintainer of the collection of pages. In places
herein, such a maintainer of the collection of pages is referred to as the Web site
operator. For example, an online merchant might set up a Web server with a collection
of pages created by the merchant or obtained from affiliates, suppliers, or partners of
the merchant and then put hyperlinks in the pages so that a visitor can browse around
the "site" as expected by the merchant. As another example, an individual dedicated to
dispensing information about opera or an uncommon medical condition might set up a Web
server and populate it with pages about the particular subject, including such things
as references to pages outside their collection of pages, dynamically generated pages
of comments made by visitors, or e-mail sent to the operator of the Web server.

[0004] Although many Web sites are targeted to single topics, some Web site operators
serve many different interests and have integrated many different "properties" into a
large Web site, often distributed over many servers and locations to handle traffic
from a large number of visitors. "Traffic" generally refers to overall network use at a
given moment, or it can refer to specific transactions, records or users in a data
network, as in a Packets Per Second ("PPS") measurement of Internet use. As used
herein, "traffic" refers to use of a Web site or any of its pages over a given time.
"Properties," as used herein, means categories of content provided by the Web site. For
example, the Yahoo! Web site (initial URL: www.yahoo.com) brings together many
properties of interest under one umbrella, including such properties as a financial
property (for providing stock quotes and other financial information and data), a
sports property (for providing sports scores and news), an auction property, a chat
property, an instant messaging property and many others. Complex sites, where visitors
come for possibly unrelated properties, are often referred to as "portal sites".

[0005] Although the typical Web site includes one or more servers that receive requests
and provide responses according to the HyperText Transfer Protocol (HTTP), the
description herein should not be understood as being limited to a particular protocol
or a particular network. For example, the Web site might be connected to the Web
clients by an intranet, wireless access protocol (WAP) network, local area network
(LAN), wide area network (WAN), virtual private network (VPN) or other network
arrangement. In other words, a Web site for which traffic is being monitored can be
monitored independent of the protocols or network used. "Web" typically refers to
"World Wide Web" (or just "the WWW"), a name given to the collection of hyperlinked
documents accessible over the Internet using HTTP. As used herein, "Web" might refer to
the World Wide Web, a subset of the World Wide Web, a local collection of hyperlinked
pages, or the like. More generally, a Web server is a server responsive to requests
received from a Web client.

[0006] Typically, requests and responses are considered "pages". For example, with the
HTTP protocol, a Web client requests a page from a Web server and the Web server
responds to the request by sending a page. In the HTTP protocol, a Uniform Resource
Locator ("URL") identifies a page and that URL is presented to the Web server as part
of a request for a page. The pages are often HyperText Markup Language (HTML) pages or
the like. The HTML pages can be static pages, dynamic pages or a combination. Static
pages are pages that are stored on the server, or in storage accessible by the server,
prior to the request and are sent from storage to the client in response to a request
for that page. Dynamic ("on the fly") pages are pages that are generated, in whole or
in part, upon receipt of a request. For example, where the page is a view of data from
a database, a server might generate the page dynamically using rules or templates and
data from the database where the particular data used depends on the particular request
made.

[0007] The term "page hit" refers to an event wherein a server receives a request for a
page and then serves up the page. In even a moderate sized Web site, the servers might
handle millions of page hits per day. A common measure of traffic at a Web site is in
the number of page hits (often referred to as "page views", especially in an
advertising context) for particular pages or sets of pages. Page hit counts are a rough
measure of the traffic of a Web site. More refined measures include unique visitor
counts, where only one page hit is counted for each unique client for a predetermined
period. Such measures work well when the traffic of interest relates to particular
pages, but are generally not informative when traffic by topic is desired and multiple
pages may relate to one topic and one page may relate to multiple topics.
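
As a rough illustration of the difference between raw page hit counts and unique visitor counts, the following Python sketch counts both from a small log. The record layout (client identifier, page, timestamp), the function names and the choice to de-duplicate per client and page within each period are assumptions made for illustration only and are not taken from the specification.

```python
from collections import Counter

def page_hit_counts(records):
    """Count every page hit, one count per logged request."""
    return Counter(page for _client, page, _ts in records)

def unique_visitor_counts(records, period_seconds):
    """Count at most one hit per client and page within each period."""
    seen = set()
    counts = Counter()
    for client, page, ts in records:
        key = (client, page, ts // period_seconds)
        if key not in seen:
            seen.add(key)
            counts[page] += 1
    return counts

# Hypothetical log records: (client_id, page, unix_timestamp)
log = [
    ("c1", "/stocks/F", 10),
    ("c1", "/stocks/F", 20),
    ("c2", "/stocks/F", 30),
    ("c1", "/news/ford", 40),
]
hits = page_hit_counts(log)
uniques = unique_visitor_counts(log, period_seconds=3600)
```

Both measures remain keyed by page, which is why neither one, taken alone, reports traffic by topic.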

[0008] For example, where a stock information Web server serves up just a page for each
stock and only one page relates to that stock, it would be a simple matter to determine
levels of user interest in particular stocks by examining the server logs of the Web
server to determine which stock pages are being served the most. Unfortunately, most
real-world Web services are not so well defined. One more complex Web site includes
servers that serve news, sports and financial content along with content on many
different subjects, and pages that relate to a common topic might be served from more
than one of those content components. With the requests spread over different content
components, the level of user interest would not be accurately reflected in a
measurement of interest in just one content component. For example, interest in a
particular athletic shoe company might be expressed by traffic to pages containing news
stories relating to the company, traffic to sports pages referring to the company,
traffic relating to financial content about the company, searches for the company's
products, purchase transactions for the company's products, etc. Also, some requests
might be falsely associated with interest in the company if, for example, users use a
search term that has more than one meaning, one of which relates to the name of the
company.

[0009] Such a Web site might also include search capability, wherein a user submits a
search request using their Web client and a Web server responds with a page that
contains search results. It is a simple matter for a search engine (a Web site set up
to respond to search requests) to log all of the search requests. Typically, a search
request is in the form of a search phrase containing one or more search terms. Search
requests can be counted by search term, e.g., count the number of times "Ford" or
"sports" was used as a search word in a search phrase, but such counts have limited
utility where one search term might relate to multiple topics and multiple search terms
might relate to one topic.
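
Counting by search term can be sketched in a few lines of Python. The function name, the whitespace tokenization and the sample phrases below are hypothetical; the sketch only illustrates why a per-term count is ambiguous.

```python
from collections import Counter

def count_search_terms(phrases):
    """Count how many search phrases contain each term (case-insensitive)."""
    counts = Counter()
    for phrase in phrases:
        # Each distinct term is counted once per phrase.
        counts.update(set(phrase.lower().split()))
    return counts

# Hypothetical search phrases taken from a search log
phrases = ["Ford trucks", "ford dealers", "Harrison Ford movies", "sports scores"]
term_counts = count_search_terms(phrases)
```

Here the count for "ford" mixes searches about an automobile maker with a search about an actor, so the raw term count overstates interest in either topic.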

[0010] Where page hits, search requests, or other "events" such as purchases, are
logged or loggable, some operators of Web sites track statistics other than just page
hits or search requests. One well-known statistic that is often seen in Web systems,
and elsewhere, is a "top-n" list, such as a "Top Ten" list. Such a list presents the n
highest requested items. For example, a newspaper might list the 40 best-selling books
for a given month, ranked by industry-wide sales. The list might indicate, for each
book on the list, the book's ranking for the prior measurement period. As another
example, a Web site operator might include a page served by the Web server(s) that
lists the top sellers for that operator.

[0011] As yet another example, a Web site operator might include a page served by the
Web site that shows the top sellers for various categories. For example, if the Web
site operator is a toy retailer, the operator might create pages to be served by its
system wherein the pages list the top-selling toys for infants, the top-selling toys
for toddlers, the top-selling toys for teens, etc. In a variation on the basic count of
items sold, some Web site operators might include statistics showing how various items
are moving up or down in sales. For example, a list could be presented showing the top
40 sellers for the month along with their sales rank for the prior month, or a list
ranking items in order of increase in sales or sales rank.
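
A "top-n" list that also shows each item's rank for the prior period might be computed as in the following Python sketch. The function names, the tie-breaking rule and the sample sales figures are assumptions for illustration.

```python
def rank_items(sales):
    """Map each item to its rank (1 = best seller); ties are broken by name."""
    ordered = sorted(sales, key=lambda item: (-sales[item], item))
    return {item: position + 1 for position, item in enumerate(ordered)}

def top_n_with_prior_rank(current, prior, n):
    """Return (item, current_rank, prior_rank) for the n best current sellers."""
    current_rank = rank_items(current)
    prior_rank = rank_items(prior)
    top = sorted(current_rank, key=current_rank.get)[:n]
    # prior_rank.get returns None for items with no sales in the prior period.
    return [(item, current_rank[item], prior_rank.get(item)) for item in top]

# Hypothetical unit sales for two consecutive months
this_month = {"blocks": 120, "puzzle": 90, "kite": 150}
last_month = {"blocks": 200, "puzzle": 50}
report = top_n_with_prior_rank(this_month, last_month, n=2)
```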



[0012] As with the Web server that serves up specific pages for specific topics, such
as one page per stock on a stock information Web site, sales statistics such as those
described above are easy to generate. An electronic commerce server can simply log each
purchase and then a program can scan the log for a period of time to determine sales
levels for each item. The sales can also be easily categorized where the items are
already categorized. For example, a book selling Web site can log all sales of books,
where each book is already categorized (e.g., "fiction," "reference," "technical,"
"self-help," "other nonfiction," etc.) and then aggregate the sales by category to
identify sales by category or top sellers within a category. However, the "top-n" or
best-seller lists are limited in that the categorization of the items must be done
manually or along lines that are set out ahead of time and worked into the data. Thus,
such a system cannot be easily adapted to events that are not already well-categorized;
it does not combine information across multiple events and types of events, nor is the
information normalizable so that detailed and relative statistics can be derived.
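
The aggregation of logged sales by a predefined category can be sketched as follows. The category table, the book titles and the "uncategorized" fallback are hypothetical; the fallback is included only to show what happens to an item that was never categorized ahead of time.

```python
from collections import defaultdict

def sales_by_category(purchases, category_of):
    """Sum quantities sold per category using a predefined item-to-category table."""
    totals = defaultdict(int)
    for item, quantity in purchases:
        totals[category_of.get(item, "uncategorized")] += quantity
    return dict(totals)

# Hypothetical, manually maintained categorization
category_of = {"Dune": "fiction", "Emma": "fiction", "Atlas": "reference"}
# Hypothetical purchase log: (item, quantity)
purchases = [("Dune", 2), ("Atlas", 1), ("Emma", 3), ("Zine", 1)]
totals = sales_by_category(purchases, category_of)
```

The item missing from the table falls outside every meaningful category, which is the limitation described above.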

[0013] Some traffic analysis modules have been used to analyze traffic over a Web site,
but their functionality is limited. One such module performs basic statistical analysis
of Web server logs to determine Web site usage. They are typically not designed to
compute interest in particular topics, although the statistics they offer indirectly
reflect that interest. One problem with such modules is that they either rely on manual
associations of events to topics or do not associate events with topics, so the former
approach is not scalable and the latter approach does not group events in a meaningful
manner.

[0014] Heretofore, however, none of the statistics systems described above allows for
the more sophisticated, and thus informative, measurements often needed to make overall
strategy decisions with regard to trends, advertising purchases, popular culture review,
product marketing and other decisions that need to be made in light of traffic statistics
where the traffic relates to complex events and requests.
SUMMARY OF THE INVENTION

[0015] Using the present invention, a traffic monitor generates statistics about
traffic of one or more servers and is capable of associating monitored events with
topics or terms and aggregating the statistics about the monitored events into
categories. Monitored events might include page hits, search requests, purchases and/or
other actions. One use of such statistics is to determine trends and changes in areas
of user interest, in effect detecting "buzz" (a flurry of activity) due to increased
interest, where such interest is associated with a topic, term or category.

[0016] In an alternate embodiment, instead of monitoring traffic resulting from requests
from any set of users to a specified set of Web servers or Web sites (operated by one or
more entities), the traffic between a defined set of users to any set of Web servers
could be monitored instead.


[0017] In one embodiment of a traffic monitor, events are associated with topics or
terms and are grouped by category. For example, when a user provides a search server
with search terms and then selects a page from search results, the resulting page hit
might be associated with one or more of the search terms used. When a user arrives at a
particular page after navigating a subject directory, the page hit might be associated
with the subject of the navigation. By comparing changes or trends in the traffic
associated with a search term or a category, the "buzz" associated with a topic, term
or category can be assessed.

[0018] In a process of evaluating traffic, the raw values can be normalized to reduce
the effects unrelated to the buzz around a topic, term or category. For example, while
raw values for traffic are likely to grow from midnight to midday in a given
geographical area as users awake and begin accessing the server system, the traffic
measurement can be normalized to remove time of day variations. Other variations, such
as overall traffic variations, seasonal variations, weekly variations and general topic
variations (when examining buzz for more specific topics), can also be normalized out.
Ratios and difference measurements might also be performed in comparing two or more
topics, terms or categories to determine relative buzz.
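
One simple way to normalize out overall traffic variation is to divide each period's count for a subject by the total traffic in that period, as in the following Python sketch. This is only one possible normalization, chosen here for illustration; the function names and the sample counts are assumptions and the specification does not prescribe this formula.

```python
def normalized_share(subject_counts, total_counts):
    """Divide each period's subject count by that period's overall traffic."""
    return [s / t if t else 0.0 for s, t in zip(subject_counts, total_counts)]

def relative_buzz(shares_a, shares_b):
    """Ratio of two subjects' normalized shares, period by period."""
    return [a / b if b else float("inf") for a, b in zip(shares_a, shares_b)]

# Hypothetical counts at midnight, 6 a.m. and noon
total = [1000, 4000, 10000]
music = [50, 200, 500]
sports = [20, 80, 400]

music_share = normalized_share(music, total)
sports_share = normalized_share(sports, total)
sports_vs_music = relative_buzz(sports_share, music_share)
```

In this sample the raw counts for "music" grow tenfold between midnight and noon while its normalized share stays flat, whereas the share for "sports" doubles at noon, which is the kind of change a buzz measurement is meant to surface.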
[0019] Once "buzz" (used herein to mean a statistical measure of interest) is determined
for a set of topics, terms or categories, that information can be used in many ways. For
example, users might be interested in seeing what are the current popular terms or
categories, so that they can follow trends and be informed on those popular topics.
Advertisers might also be interested in buzz, since they might want to dynamically
switch their advertising to follow topics having increasing buzz.

[0020] One advantage of a traffic monitor having aspects of the present invention is that
the traffic monitor will group events so that a user of the statistical data can get
statistics that cover events that relate to a topic without including counts for events that
are not really on the same topic. Yet another advantage is that counts can be normalized
for a topic or term against other topics or terms in a category.

[0021] A further understanding of the nature and the advantages of the inventions
disclosed herein may be realized by reference to the remaining portions of the 
specification and the attached drawings. 
BRIEF DESCRIPTION OF THE DRAWINGS

[0022] Fig. 1 is a block diagram of a Web site system including a statistical analyzer
within which a traffic monitor according to one embodiment of the present invention
might be used.

[0023] Fig. 2 is a graph of a category hierarchy, showing categories and subcategories,
as well as terms associated with categories and subcategories.

[0024] Fig. 3 is a schematic of a data structure used to represent counts by category and
topic/term; Figs. 3(a) and 3(b) show data structures.

[0025] Fig. 4 is a schematic of a data structure for storing multiple sets of traffic data,
one set per period.

[0026] Fig. 5 is a schematic diagram of a canonicalization system.

[0027] Fig. 6 is a flowchart of a process for categorizing search words.

[0028] Fig. 7 is a block diagram of a server system including a traffic monitor according
to one embodiment of the present invention.

[0029] Fig. 8 is a flowchart of one process for generating buzz/trend reports.

[0030] Fig. 9 is an illustration of a buzz report.

[0031] Fig. 10 is an illustration of a list of vertical market topics for which buzz can be
presented in the exemplary report of Fig. 9.

[0032] Fig. 11 is an illustration of a report where the buzz for terms is plotted over time
and relative to other terms in a category.

[0033] Figs. 12(a) and 12(b) together illustrate a report showing buzz values for
subcategories in a category.

[0034] Fig. 13 illustrates a campaign monitoring page.

[0035] Fig. 14 illustrates a campaign monitoring report.

[0036] Fig. 15 illustrates intersection analysis.

[0037] Fig. 16 illustrates associated interests analysis.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS 
[0038] The following description is organized approximately according to the following
outline:

1. Overview
2. Collecting Traffic and Binning by Subject
2.a. Categorization
2.b. Canonicalization
3. Examples of Sources of Data for Traffic Monitor and Uses for Collected Data
4. Uses of the Statistical Analysis
4.a. Buzz/Trend Reports
4.b. Selling Advertising Space Based on Categorizations and/or Buzz
4.c. Campaign Monitoring
4.d. Intersection Analysis
4.e. Associated Interests Analysis
5. Variations on the Basic System

1. Overview 

[0039] In this description, the term "buzz" refers to a measurement of the user
activity that relates to a particular topic, term or category. As used herein, "subject"
generically refers to one or more of a topic, a term, or a category. Thus, the topic "U.S.
presidential politics", the search term "Ford" and the category "music", are all subjects for
which "buzz" can be measured.

[0040] "Traffic" refers to a count, or approximate count, of the events (hits, searches,
requests, purchases, etc.) that occurred for a given subject. Traffic can be measured
either for a defined set of servers accessed by a possibly unconstrained set of
clients/users ("selected servers/all clients"), for a defined set of clients/users
accessing a possibly unconstrained set of servers ("all servers/selected clients"), or
for a defined set of clients accessing a defined set of servers ("selected
servers/selected clients"). For example, the selected servers might be the servers that
serve content for one or more defined Web sites, the servers that are monitored by an
advertising network or ratings network, the servers monitored by a university network
monitoring system, etc.

[0041] "Traffic" might be a raw count of the number of events, unnormalized or
otherwise, but traffic might also be measured not with one count per event but with one
count per unique user (i.e., even if a particular user makes multiple requests, only one
request is counted), or with one count per unique user per time period. Traffic can be
unnormalized, such as integer counts for the number of events, or can be normalized. One
purpose for normalization is to place the number in a suitable value range for
presentation or other processing. Another purpose for normalization is to normalize out
variability in the counts that is likely to be variability independent of levels of
user interest.

[0042] In general, monitoring traffic for any users or any servers ("all servers/all
clients") is practical only in a centrally managed system and cannot currently be
effected for Internet clients and servers in general. However, if logs of such activity
were available, the traffic monitors and statistical analyzers described herein might
be used to measure traffic and buzz in a more general setting. The examples herein
largely refer to the "selected servers/all users" variation, but one of ordinary skill
in the art would understand how to apply this disclosure's teachings of that variation
to the other variations.

[0043] Events can be page views, search requests, purchases, requests for media such as
streaming audio or video, message board actions, chat room actions, club actions, instant
messaging actions, online gaming actions, or any other action that is detectable by a
server of a Web site. The expected use of the traffic monitor is to monitor large numbers
of events, often measuring in the millions, to discern trends and buzz. To enhance the
usefulness of the results, events should be logically grouped so that the groupings will by
and large have statistical significance and topical relevance. The process of grouping
events is referred to herein as "binning".

[0044] Whatever the extent of the traffic monitoring (e.g., selected servers/all users),
the results can be sliced up by demographic information. For example, the traffic
monitor can provide the overall counts for the category "music", but the traffic monitor
can also divide up the overall counts by different demographic categories, using
user-provided demographic data or demographic data provided in another way. For example,
the traffic monitor can provide buzz values for the demographic of males 18-45 years
old, with U.S. addresses. An example of demographic information other than user-provided
information is the user's client's IP (Internet Protocol) address. Examples of
user-provided information include age, gender, residence location, and user preferences,
such as browser type, client type, network type, etc. In addition to slicing up the data
to show traffic for a particular demographic, the demographic data can be used to show
how a particular count for a topic is divided up among the demographic categories.
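
The two uses of demographic data described above, counting traffic for one demographic and dividing a subject's count among demographic categories, can be sketched as follows. The event dictionaries, field names and function names are hypothetical and chosen for illustration only.

```python
from collections import Counter

def counts_by_demographic(events, subject, field):
    """Divide one subject's count among the values of a demographic field."""
    return Counter(e[field] for e in events if e["subject"] == subject)

def count_for_segment(events, subject, predicate):
    """Count a subject's events for users matching a demographic predicate."""
    return sum(1 for e in events if e["subject"] == subject and predicate(e))

# Hypothetical events already binned by subject and joined with user data
events = [
    {"subject": "music", "gender": "M", "age": 25, "country": "US"},
    {"subject": "music", "gender": "F", "age": 31, "country": "US"},
    {"subject": "music", "gender": "M", "age": 52, "country": "US"},
    {"subject": "sports", "gender": "M", "age": 30, "country": "US"},
]

by_gender = counts_by_demographic(events, "music", "gender")
males_18_45_us = count_for_segment(
    events,
    "music",
    lambda e: e["gender"] == "M" and 18 <= e["age"] <= 45 and e["country"] == "US",
)
```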


2. Collecting Traffic and Binning by Subject 

[0045] Fig. 1 is a block diagram of a traffic monitor 100 including a canonicalizer 102,
a categorizer 104, a count generator 106 and a canonicalization database 108.
Canonicalizer 102 is coupled to receive search log records and page hit records to
determine the relevant topic for a given search request or page hit.
Canonicalizer 102 might refer to canonicalization database 108 to resolve
canonical terms.

[0046] In alternate embodiments, different sets of one or more server logs are used to
identify the bin or bins for which counts are incremented for an event logged in the server
logs. For example, the system shown in Fig. 1 might include an additional log of
purchase records or streaming media downloads. Where the events to be binned are
purchase events, each event can be evenly weighted or each event can be weighted
according to a purchase amount.
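
The two weighting options for purchase events can be sketched as follows. This is only an illustrative sketch: the function name, the `(product, amount)` record layout, and the sample data are assumptions made for the example.

```python
from collections import defaultdict


def bin_purchases(purchases, weight_by_amount=False):
    """Bin purchase events by product.

    Each purchase adds 1 to its bin (even weighting) or adds the purchase
    amount (amount weighting), depending on `weight_by_amount`.
    """
    bins = defaultdict(float)
    for product, amount in purchases:
        bins[product] += amount if weight_by_amount else 1.0
    return dict(bins)


# Hypothetical purchase log: (product, purchase amount)
purchases = [("widget", 20.0), ("widget", 5.0), ("gadget", 100.0)]

even = bin_purchases(purchases)
weighted = bin_purchases(purchases, weight_by_amount=True)
```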

[0047] As an example of a specific traffic collection operation, suppose that thousands
of users connect to a search server and perform a search using the phrase "local weather".
The search server might respond to that phrase by presenting the user with a results page
including links to pages relating to weather and specifically local weather (where locality
might be inferred from user preferences or other methods). The search server logs the
search itself and the "clicked-through" pages from the results page. A page is a
"clicked-through" page when a user notes a reference to that page on the results page and
selects that reference from the results page. In a standard HTTP system, the effect of
those actions is that the user's browser (or other HTTP client) requests the referenced
page from the server indicated in the reference (which may or may not be a
portal server) and the referenced server responds to the request with the referenced page.

[0048] If the search server serves pages from a potentially large number of pages,
tracking hits for each page might result in statistics that are too fragmentary to be
useful. Because of this, it is often useful to aggregate hits by subject. For example, if,
in one day, there are fifty requests for a local weather page for fifty different
localities, it might be more informative to say that there were fifty requests for
weather information than to make 50 statements that there was one request for
weather in a given locality. Because of this, in the preferred embodiments, traffic
monitor 222 aggregates counts into bins, where each bin is for a particular topic or term.
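
The aggregation of page hits into topic bins can be pictured with a short Python sketch. The page identifiers and the page-to-topic table below are hypothetical; the specification does not prescribe how that mapping is stored.

```python
from collections import Counter

# Hypothetical mapping from page ID to the topic (bin) the page belongs to.
PAGE_TOPIC = {
    "weather/san-jose": "weather",
    "weather/denver": "weather",
    "weather/boston": "weather",
    "scores/san-jose": "sports scores",
}


def bin_hits(page_hits, page_topic):
    """Aggregate per-page hits into per-topic bins; unmapped pages are skipped."""
    bins = Counter()
    for page_id in page_hits:
        topic = page_topic.get(page_id)
        if topic is not None:
            bins[topic] += 1
    return bins


hits = [
    "weather/san-jose",
    "weather/denver",
    "weather/boston",
    "scores/san-jose",
    "unknown/page",
]
bins = bin_hits(hits, PAGE_TOPIC)
```

Three one-hit weather pages for three localities become a single "weather" bin with a count of three, which is the more informative statistic described above.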

[0049] A given event can be binned with other events that relate to the same topic or
term to achieve statistical significance and topical relevance to the counts for the topic or
term, even though the events may appear to be quite different. For example, page hits for
a page known to relate to the U.S. presidential elections can be binned with page hits for
other pages known to relate to the U.S. presidential elections. Where the page hits are the
result of a given search term, the page hits are binned with other results for the search
term. In this manner, counts are accumulated for a bin associated with that topic or term.
A given topic or term is associated with one or more categories; although a traffic monitor
could be designed wherein a topic or term might be associated with none of the categories,
it is usually best to consider that any given topic or term falls into at least one category,
even if the category is a catch-all "miscellaneous events" category or the root of a
category hierarchy.

[0050] In some implementations, categories are organized hierarchically, with a first
level of categories, subcategories within categories and possibly subcategories within
subcategories. In this hierarchical arrangement, an example of which is shown in Fig. 2,
topics/terms are associated with categories and/or subcategories. Unless otherwise
indicated, where "category" is used herein, it should be interpreted to refer to a main
category or a subcategory.

[0051] In some cases, one topic/term is present in more than one category, as with (see
Fig. 2) the term "New Orleans", which is found in the categories "Music", "Blues" (itself
a subcategory of "Music"), and "Travel". Typically, where one term is present in two or
more categories, the term has two meanings. If the meaning can be discerned from the
context, then only the count for the actual meaning of the term is incremented.
In the following section, a categorizer for identifying the particular bin or bins in which to
count an event is described. For example, if the context of an event was travel to New
Orleans, the count for the term "New Orleans" under the category "Travel" would be
incremented, but the counts for the other "New Orleans" terms would not be.

[0052] Fig. 3 illustrates one possible arrangement of data structures for storing the
counts for bins. As shown in Fig. 3(a), a category record 150 contains data elements
relating to a label for the category and counts for each of a plurality of demographics.
Where demographics are not used, the category record would store just a single
count. In general, since counts are for topics and terms, the category record need
not contain the count(s) for the category. Instead, the count(s) for a category could be
determined by summing the counts for all the terms in that category. However, in a
system with large numbers of events, storing the counts for categories may result in a
much faster system than if the category counts had to be calculated each time they were
used.
[0053] Fig. 3(a) also shows a subcategory record 152. Subcategory
record 152 is similar to category record 150 except that subcategory record 152 includes a
pointer to the category for that subcategory.

[0054] Fig. 3(b) illustrates a bin record 154 associated with a topic or term. Bin record
154 includes a label for the topic or term and includes count data for one or more
categories (or subcategories). For each represented category, bin record 154 holds count
data for the topic/term in that category as well as a pointer to the category.

[0055] Fig. 4 illustrates a data structure 170 that might be used to store multiple sets of
traffic data, one set per period. In this example, the period is daily, so data structure 170
stores a collection of category/subcategory records and bin records for each of a plurality
of dates.
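
One way to picture the records of Figs. 3 and 4 is the Python sketch below. It is a loose illustration only: the class names, field names, and the choice to key per-category counts by category label (standing in for the pointer to the category) are assumptions, not the layout of the figures. The sketch also keeps stored category counts up to date on each increment, which corresponds to the faster of the two options discussed above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CategoryRecord:
    """Category or subcategory record: a label plus counts per demographic."""
    label: str
    counts: Dict[str, float] = field(default_factory=dict)
    parent: Optional["CategoryRecord"] = None  # set only for subcategory records


@dataclass
class BinRecord:
    """Bin record for one topic/term, with counts held per category."""
    label: str
    # category label -> (demographic -> count)
    counts: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def increment(self, category, demographic="all", amount=1.0):
        per_demo = self.counts.setdefault(category.label, {})
        per_demo[demographic] = per_demo.get(demographic, 0.0) + amount
        # Keep the stored category counts current, walking up the hierarchy.
        node = category
        while node is not None:
            node.counts[demographic] = node.counts.get(demographic, 0.0) + amount
            node = node.parent


music = CategoryRecord("Music")
blues = CategoryRecord("Blues", parent=music)
travel = CategoryRecord("Travel")

# One set of bin records per period (here, per date), as in data structure 170.
daily = {"2004-01-01": {"New Orleans": BinRecord("New Orleans")}}
new_orleans = daily["2004-01-01"]["New Orleans"]
new_orleans.increment(travel)
new_orleans.increment(blues)
```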

2. a. Categorization 

[0056] Categorizer 104 determines the bin or bins that have their count incremented for
a particular event. For example, where the event is a search request using the search
phrase "formula one" and the search results page lists pages related to algebra and auto
racing, the search might be categorized under mathematics or sports. However,
categorizer 104 correlates searches with search results selected, so that when the logs
show that the user selected from the search results a page relating to auto racing,
categorizer 104 allocates that event to the "auto racing" category and the "formula one"
term in that category. Where terms remain ambiguous even after selection of a page (or if
the user does not select a page from a search results page), categorizer 104 might output
fractional counts for more than one category with suitable weights summing to one.

[0057] In some cases, the category associated with a page hit or a search is readily
determinable by the state of a visitor's server session. For example, if the user is
navigating a search directory by category/subcategory using a search term and then
selects an entry under a subcategory, then the count for that event is readily allocable to
the bin for the search term under the category and/or subcategory previously assigned to
that entry. For example, if a user navigates the Yahoo! search directory path "Top:
Sports: Regional Sports: San Jose" using the search term "scores" and selects a page from
the result, then the categories and subcategories that get the count are readily
ascertainable.

[0058] However, with direct searches with words having multiple meanings, the
category might not be so apparent. For example, if the user started a search within the
Yahoo! search path "Top:" and requested a search on "Ford" and "Michigan", the
category is unclear because the visitor might be interested in the Gerald R. Ford Library
in Ann Arbor, Michigan, or the visitor might be interested in the Ford Motor Company,
which has offices in Michigan. One method of resolving the ambiguity is to examine the
resulting clickstream. For example, a Yahoo! search directory search using the search
phrase "Ford Michigan" might return several matches, including those shown in Table 1.

Table 1

Regional > U.S. States > Michigan > Cities > Ann Arbor > Education > College and
University > Public > University of Michigan > Libraries and Museums

Gerald R. Ford Library

Regional > U.S. States > Michigan > Metropolitan Areas > Detroit Metro > Business and
Shopping > Shopping and Services > Automotive > Dealers > Makes

Ford

[0059] When a user is presented with the entries shown in Table 1 and selects the first
clickable link (Gerald R. Ford Library), the categorizer would assign the count for the
event to the "Libraries and Museums" subcategory (and to each higher level subcategory
if such tracking is performed). However, if the user selects the second clickable link, the
categorizer assigns the second category/subcategory path shown in Table 1.

[0060] When the categories tracked by the statistics monitor overlap the category
structure of the search directory, the task of assigning counts is complete. However,
where the structure of the statistics monitor does not overlap the structure of the search
directory, some additional steps may be performed. For example, if the statistics
monitor had categories for each U.S. state and categories for each U.S. President, then the
count for the search term "Ford Michigan" followed by a click on the first clickable link
in Table 1 might result in the statistics monitor assigning half a count to the category for
Michigan and half a count to the category for former U.S. President Gerald R. Ford.

[0061] In a more precise implementation of such a system, the counts might not be
even. Continuing with the example of Table 1, more than half a count might be assigned
to the more likely category of interest and the remainder to the other category. Thus, one
might expect that a click on a link to the Gerald R. Ford Library is more likely to reflect
an interest in the library as opposed to an interest in Michigan, where the library happens
to be located.
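
The even and uneven splits described above can be sketched as a single allocation step in which one event contributes fractional counts that sum to one. The helper below is a hypothetical illustration; in particular, the 0.75/0.25 split is an arbitrary example weight, since the specification does not give values for the uneven case.

```python
from collections import defaultdict


def allocate(counts, weights):
    """Add one event to several categories, split by weights that sum to one."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    for category, weight in weights.items():
        counts[category] += weight


counts = defaultdict(float)

# Even split: half a count to each category.
allocate(counts, {"Michigan": 0.5, "Gerald R. Ford": 0.5})

# Uneven split favoring the more likely interest (weights are illustrative).
allocate(counts, {"Michigan": 0.25, "Gerald R. Ford": 0.75})
```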

[0062] The search engine for the search directory returns a list of matches with one or
more clickable links per match. Generally those links can be categorized into one of three
categories: 1) internal pages, 2) external pages categorized internally and 3) external
pages not categorized internally. A Type 1 link is easily categorized by assigning a
category to the page pointed to by the link. A Type 2 link does not have an explicitly
assigned category, but can be categorized because the referenced page is referenced
elsewhere on the portal site by a Type 1 link. The categorization for Type 1 links is
easier than categorizing all possible search terms, and may have already been done if the
search directory is organized by subject, as with the Yahoo! search directory.

[0063] Fig. 5 illustrates one process to categorize search words for Type 1 and Type 2
links based on the link selected. Type 3 links can be binned as well, if some category
indication is present or a categorization engine that handles such links is used to identify
their categories. As shown, a categorizer would extract the search word events (user ID,
timestamp, search words) from search logs. The user ID can be implemented as a unique
cookie stored in the user's Web browser that is sent to the search engine and Web page
server with each request and is stored in the logs.
[0064] The categorizer also extracts from page view logs the user ID, timestamp, page
ID, etc. for each page view. After sorting both of the extractions by time, the categorizer
can interleave the extractions and determine which page is viewed after a user views
results of his or her search. From that determination, the categorizer can look up the
category of the viewed page and that category can be attributed to the search. Where the
search is being tracked for buzz evaluation or other counting evaluation, the category
count is incremented. Where a category cannot be determined, the event can be ignored
for monitoring purposes.
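
The extract, sort, interleave and look-up steps described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the tuple layouts, the function name, and the rule that only the first page view after a search is attributed to that search are choices made for the example, not details taken from Fig. 5.

```python
def categorize_searches(search_events, page_views, page_category):
    """Attribute a category to each search from the page viewed after it.

    search_events: iterable of (user_id, timestamp, search_words)
    page_views:    iterable of (user_id, timestamp, page_id)
    page_category: dict mapping page_id to category
    Returns a list of (search_words, category) pairs.
    """
    # Interleave both extractions in time order; kind 0 = search, 1 = page view.
    merged = sorted(
        [(ts, 0, user, words) for user, ts, words in search_events]
        + [(ts, 1, user, page) for user, ts, page in page_views]
    )
    pending = {}  # user_id -> most recent search not yet attributed
    attributed = []
    for _ts, kind, user, payload in merged:
        if kind == 0:
            pending[user] = payload
        elif user in pending:
            words = pending.pop(user)
            category = page_category.get(payload)
            if category is not None:  # undetermined categories are ignored
                attributed.append((words, category))
    return attributed


searches = [("u1", 100, "formula one"), ("u2", 105, "formula one")]
views = [("u1", 110, "p_racing"), ("u2", 120, "p_algebra"), ("u3", 130, "p_racing")]
categories = {"p_racing": "auto racing", "p_algebra": "mathematics"}

result = categorize_searches(searches, views, categories)
```

The same search phrase ends up in different categories for the two users, based solely on the page each user selected.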

[0065] In previously developed categorizers, the search terms are used to identify the
category that gets credit for the hit, but using the above method, the category is identified
from the page that is visited after the search, eliminating the need for complex semantic
analysis to resolve ambiguities or manual categorization of search words, which is not
scalable to a large system.

[0066] As an alternative to the method described in Fig. 5, the links on the search result
page can be rewritten to include "redirects" (i.e., intermediate commands executed upon a
click) that log the page ID and search phrase, so that only one log is needed. With one
log, the sorters and interleaver are not needed.
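
A redirect-based variant might look roughly like the following sketch, in which each result link is rewritten to pass through a logging endpoint that records the search phrase and page ID together. The endpoint URL, the parameter names (`page`, `q`, `to`), and both helper functions are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

REDIRECT_BASE = "https://portal.example/r"  # hypothetical redirect endpoint


def rewrite_link(page_id, search_phrase, target_url):
    """Rewrite a result link so a click carries the page ID and search phrase."""
    query = urlencode({"page": page_id, "q": search_phrase, "to": target_url})
    return f"{REDIRECT_BASE}?{query}"


def handle_click(redirect_url, log):
    """Log (search phrase, page ID) in a single log and return the real target."""
    params = parse_qs(urlsplit(redirect_url).query)
    log.append((params["q"][0], params["page"][0]))
    return params["to"][0]


click_log = []
link = rewrite_link("p42", "local weather", "https://weather.example/sj")
destination = handle_click(link, click_log)
```

Because each log entry already pairs the search phrase with the selected page, no sorting or interleaving of separate logs is required.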

[0067] Either way, the categorizer finds the meaning of a search term that the user
ascribes to the term, in an inherently scalable way.

2.C. Canonicalization 

[0068] When dealing with search words, it often makes sense to combine information
about similar terms that are intended to produce the same results. For example, a term
may be misspelled, or it may have words in a different order than another, or it may
contain nonessential words such as "the". The process of reducing such
terms to a common, standard form is known as canonicalization. Many processes are
known for performing canonicalization, ranging from less aggressive processes such as
removing certain punctuation characters or so-called "stop words" such as "of" and "the",
to more aggressive processes such as adding, changing or deleting letters within words.
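
A less aggressive canonicalization of the kind described above can be sketched in a few lines of Python. The stop-word list and the choice to neutralize word order by sorting are assumptions for illustration; they are not the method of canonicalizer 102.

```python
import string

STOP_WORDS = {"the", "of", "a", "an"}  # illustrative list only


def canonicalize(term):
    """Reduce a search term to a simple canonical form.

    Lowercases the term, strips punctuation, drops stop words, and sorts the
    remaining words so that word order does not matter.
    """
    cleaned = term.lower().translate(str.maketrans("", "", string.punctuation))
    words = [w for w in cleaned.split() if w not in STOP_WORDS]
    return " ".join(sorted(words))


a = canonicalize("The Weather, Local")
b = canonicalize("local weather")
```

Both inputs reduce to the same form, so their counts could be accumulated in one bin. Spelling correction, one of the more aggressive processes mentioned above, is deliberately left out of this sketch.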

[0069] The canonicalization process might be performed by canonicalizer 102 that is
part of traffic monitor 100 (see Fig. 1). As an example, canonicalizer 102 might canonize
the search phrase "Denver whether" to "weather" by inferring that a spelling error
occurred. As with categorizer 104, canonicalizer 102 uses user behavior to improve the
canonicalization process. Using user behavior is inherently scalable because there are
generally proportionately more users to give human input as the system grows larger to
handle more traffic.

[0070] Using user behavior (a large increase in number of searches) also allows more
aggressive canonicalization. For words whose search usage has rapidly increased,
more aggressive canonicalization techniques can be used. In addition, when combining
information (such as number of searches) about such aggressively canonicalized terms,
the system does not just add the values, but transfers the portion of the value that exceeds
a prior baseline value to the canonicalized term, leaving the remainder attached to the
raw, uncanonicalized term. For example, if "Concord" (Massachusetts) has a current
value of 420 and is to be combined with "Concorde" (the airplane) with a current value of
825, and "Concord" had a prior baseline value of 130, we transfer a value of 290 (420 -
130) to the canonicalized term, ending with "Concord" at 130 and "Concorde" at 1115.
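
The "Concord"/"Concorde" arithmetic can be checked with a short sketch that transfers only the portion of a raw term's value above its baseline. The function name and the dictionary layout are illustrative assumptions.

```python
def merge_over_baseline(values, baselines, raw_term, canonical_term):
    """Move the above-baseline portion of raw_term's value to canonical_term."""
    excess = max(values[raw_term] - baselines[raw_term], 0)
    merged = dict(values)
    merged[raw_term] = values[raw_term] - excess
    merged[canonical_term] = values[canonical_term] + excess
    return merged


values = {"Concord": 420, "Concorde": 825}
baselines = {"Concord": 130}

merged = merge_over_baseline(values, baselines, "Concord", "Concorde")
# The excess is 420 - 130 = 290, so "Concord" keeps 130 and "Concorde" gets 825 + 290.
```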

[0071] The baseline value can be defined as the average of the value for a previous
period. In one embodiment, the baseline value is retained. If the value for the term being
combined declines to its previous baseline, the terms are no longer merged. Combining
only the values over baseline more accurately reflects reality for terms with multiple
meanings.

[0072] Fig. 6 illustrates a typical implementation of a canonicalization process. The
aggressive canonicalization step might include adding, changing or removing letters from
search terms. If the value of the term being merged is within some margin, such as 20%,
of its baseline, the term is no longer merged. Terms (or fractions of the values of terms)
are merged when they are likely to be about the same topic. In the case of
rapidly changing terms, it is unlikely that two similar-appearing but conceptually different
terms will both have rapid rises at the same time. Thus, it is possible and desirable to
merge similar-appearing terms that both have rapid rises, since they most probably relate
to the same concept or topic.

[0073] For example, the term "U.S. Open" might exhibit rising interest. If the term
"U.S. Open Golf" is also exhibiting rising interest, but the term "U.S. Open Tennis" is
not, the canonicalizer assumes that the terms "U.S. Open" and "U.S. Open Golf" refer to the
same subject and can be combined but "U.S. Open" and "U.S. Open Tennis" should not
be combined. Once the interest levels in "U.S. Open Golf" or "U.S. Open" fall back to
around their baseline, the canonicalizer would separate these terms again, to bin
them separately. This would provide a desirable system response, at least for the
above example, because depending on the timing of the U.S. Open sporting events, "U.S.
Open" might relate to "U.S. Open Golf", then fall back near its baseline and then rise
along with "U.S. Open Tennis", at which point "U.S. Open" would be associated with the
"U.S. Open Tennis" category.

[0074] Thus, the canonicalizer would respond to canonicalizations that change over
time, as is often the case in the real world of user interests. When combined with other
elements of a traffic monitor, the buzz values for terms that reflect actual user interests
are readily available for use by the canonicalizer to determine which topics/terms to
merge and when.
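
The time-varying merge decision for the "U.S. Open" terms can be sketched as a simple test of whether both terms are currently well above baseline. This is a deliberately reduced illustration: it assumes the candidate terms have already been judged similar in appearance, it treats "rising" as exceeding the baseline by more than a 20% margin, and all the numeric values are invented for the example.

```python
def is_rising(value, baseline, margin=0.20):
    """A term counts as rising while its value exceeds baseline by the margin."""
    return value > baseline * (1 + margin)


def should_merge(term_a, term_b, values, baselines, margin=0.20):
    """Merge two similar-appearing terms only while both are rising."""
    return all(is_rising(values[t], baselines[t], margin) for t in (term_a, term_b))


# Hypothetical baselines and current values.
baselines = {"U.S. Open": 100, "U.S. Open Golf": 50, "U.S. Open Tennis": 60}
values = {"U.S. Open": 300, "U.S. Open Golf": 200, "U.S. Open Tennis": 62}

merge_golf = should_merge("U.S. Open", "U.S. Open Golf", values, baselines)
merge_tennis = should_merge("U.S. Open", "U.S. Open Tennis", values, baselines)

# Later, interest in golf falls back to within 20% of its baseline.
later = dict(values, **{"U.S. Open Golf": 55, "U.S. Open": 110})
still_merged = should_merge("U.S. Open", "U.S. Open Golf", later, baselines)
```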

3. Examples of Sources of Data for Traffic Monitor and Uses for Collected Data

[0075] Fig. 7 is a block diagram of server system 210 including traffic monitor 100
according to one embodiment of the present invention. In server system 210, users
connect to servers 214 by connecting user (client) computers 212 to servers 214
through a network 216. In a specific implementation, user computers 212 are
Internet-connectable computers (desktop computers, laptop computers, palm-sized
computers, wearable computers, set-top boxes, embedded TCP/IP clients, and the like),
servers 214 are Internet-connected servers responsive to requests at an URL designated
by the portal operator and network 216 is the "Internet". The typical computer 212
includes a browser or other HTTP client that provides a user with HTTP
access to the Internet and the Web.

[0076] The particular details of how a particular user computer 212 connects to a
particular server 214 and how the particular server 214 is selected are not shown here, as
many such arrangements exist and the present invention is not limited to any
particular client-server arrangement. In the figures, distinct instances of like objects are
distinguished with parenthetical indices. For example, user computer 212 might refer to
212(1) or 212(n). As used herein, "n" refers to an indeterminate integer where the actual
value of the integer is not relevant and may depend on details not relevant here. It is used
in various contexts and the value of "n" may be different in each context, unless otherwise
indicated. For example, the user computers in Fig. 1 are shown ranging from 212(1) to
212(n) and the servers are shown ranging from 214(1) to 214(n). Thus, one can infer that
there are an indeterminate number of user computers and servers and the actual number is
not relevant for the purposes of this description, but one should not infer that the number
of user computers and servers must be the same.

[0077] Fig. 7 shows, in addition to user computers 212 and servers 214, several other
components, such as storage for server logs 220, traffic monitor 100 with inputs for
reading server logs 220 and outputs for count data to be added to a statistics database 224.
A Web server 230 coupled to network 216 and a database server 232
that is, in turn, coupled to statistics database 224 are also shown.

[0078] In a typical operation, a user connects a user computer 212 to a server 214 and
requests one or more pages, each page being identified by an URL. Because of the
user perception of this process, it is often described as a visitor going to a
particular page on a Web site as defined by an URL. However, the visitor does not
actually move anywhere and there might not be a
physical "site" that can be pointed to as the place that is visited. Nonetheless, such
analogies have become quite common and are used herein. Thus, it should be understood
that the act of a user or "visitor" going to a page on a site is normally an act of the user or
visitor directing its computing device to make a request through a network
for a page specified by the URL of the request and
maintained on a server specified in the URL or in the request, along with the method
of receiving a response from the server and possibly displaying it or processing it.

[0079] In current use, even the term "page" is somewhat of an analogy to the
beginnings of the World Wide Web, when the requests were for page files stored in
directories on the server specified in the URL. However, in current use, "page" refers to
what is returned by the server and thus a page might be data that is not even in existence
at the time of the request (e.g., dynamic Web pages).

[0080] One possible order of events is now described with reference to Fig. 7.
The events described below correspond to circled numbers in the figure, which are
parenthetically referenced in the text below. One of ordinary skill in the art will
recognize, after reading this disclosure and reviewing the figures, that other
orders of events are contemplated by this disclosure and many equivalents can be inferred
from the figures and text.

[0081] The events illustrated by the circled numbers begin with a process of logging
page hits (1) occurring on servers 214. Many of the details of the logging process are
described in further detail above. Once the server logs are created, traffic monitor 222
reads the logs to identify counts of hits by subject (2) and store those counts in
statistics database 224 (3). The next event is when a user issues a statistical query
relating to buzz (4). As shown, the user issues the query using user computer 22(a), but it
should be understood that any computer or computing device with sufficient rights and
capabilities, including user computers 212(1) through 212(n), could be used for buzz
queries. From whatever source, Web server 230 receives the request for buzz statistics
and translates the request into a database query, which is presented to database server 232
(5). In response, database server 232 reads data from statistics database 224 (6) and
returns a database result to Web server 230 (7). Web server 230 then formats the
database result into a Web page and delivers that page to the requesting user computer
(8). An example of such a delivered page is shown in Fig. 9. That example
page is responsive to a request for top buzz values for overall events and events specific
to the categories of movies, music and sports.

[0082] Note that, depending on the device making the request, what is returned by Web
server 230 might not be in the form of an HTML page, but would typically be in a form
usable by the requesting device. For the purposes of providing at least one specific
detailed example, this description assumes that user computers 212(1)-(n) are HTTP
clients and request pages interactively from servers 214, and that user computer 212(1) is
also an HTTP client and interacts with Web server 230 in a conventional manner.
It should be understood, however, that when many querying devices are in
use, Web server 230, and possibly database server 232, might be replaced with arrays of
servers to handle the load of statistical queries.

[0083] In one embodiment of a traffic monitor that is described herein, the monitor
operates off usage logs generated by a Web site's servers. Notwithstanding that
description, it should be understood that the monitor might operate off other
indications of traffic, such as real-time page hits, click streams, purchase records or
database records. Furthermore, although the traffic monitor is shown as a unified
system, a distributed traffic monitor might be used where such distribution aids in making
the traffic monitor scalable and less computationally complex, all without necessarily
departing from the scope of the invention. It should also be understood that the present
invention is not limited to a particular Web site or collection of Web sites, although many
of the examples show examples from a specific Web site, namely the Yahoo! Web site.

4. Uses of the Statistical Analysis 

[0084] As described above, a "buzz" value represents the level of interest of a subject,
such as a movie, a person, product, place, event, cultural phenomena, etc., and the
change in buzz value might be indicative of a trend. The buzz value can be calculated as
the number of unique users searching for that subject anywhere on a portal site or set of
portal sites, or viewing a page of content relevant to that subject anywhere on the portal
site or set of portal sites. As described herein, buzz might also be calculated without
regard to whether each event that is counted is originated by a unique user.

[0085] The buzz values can be used to identify cultural trends, track interest in specific
brands, measure the effectiveness of marketing campaigns, etc. For buzz events that are
purchase events, the count by which a bin is incremented might be a function of purchase
amount, so that purchases of larger amounts have more of an effect on a product's buzz
than purchases of smaller amounts.

[0086] In one variation, the buzz value associated with a particular term or category is
the number of users searching with that term, or viewing a page related to that term,
divided by a sum of users searching, where the sum can be the sum of users searching
over all subcategories in a category, the sum of users searching over all terms in a category,
or the sum of all users searching anywhere on the site. The latter normalization is useful
to factor out time-based increases in traffic, such as weekday-weekend patterns, seasonal
patterns and the like. A normalization factor might be applied to all terms being
compared so that the buzz values are easily represented. For example, if there are four
terms in a category, 100 total unique user hits on those four terms (25, 30, 40 and 5,
respectively) out of one million total unique users, a normalization factor of 100,000
might be applied so that the buzz values are 2.5, 3, 4 and 0.5, instead of 0.000025,
0.00003, 0.00004 and 0.000005. Normalization can also be used when determining the
buzz surrounding one company or product against an index of other companies or
products within a particular market segment or product category.
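
The normalization example can be reproduced with a short calculation. The function name and the term labels below are hypothetical; the numbers are those of the example above.

```python
def buzz_values(term_users, total_users, factor=100_000):
    """Buzz per term: unique users for the term over total users, times a factor."""
    return {term: users * factor / total_users for term, users in term_users.items()}


# Four terms in a category with 25, 30, 40 and 5 unique user hits,
# out of one million total unique users.
term_users = {"term 1": 25, "term 2": 30, "term 3": 40, "term 4": 5}

buzz = buzz_values(term_users, total_users=1_000_000)
```

With the factor of 100,000 the four values come out as 2.5, 3, 4 and 0.5, matching the figures in the text. Dividing by site-wide users rather than by a raw count is what factors out overall traffic swings.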

[0087] In some cases, the buzz values for a subject might be a leading indicator for
electronic commerce transactions relevant to that subject. For example, the buzz for the
term "widget" might rise and be followed by increased on-line purchases of widgets.
Such information is useful to advertisers interested in having their brands of widgets
be selected, as well as fulfillment managers eager to have in stock the latest trendy items.

[0088] Buzz values can be presented from overall data or can be isolated to specified
demographic groups. Thus, with enough traffic, the traffic monitor can track the top buzz
among women aged 33-45, the top buzz for "newbies" (people who are new to the online
world), buzz by country, or by regions of countries. In addition to just a buzz number, the
system might also provide a commerce index to show how different vertical markets or
products are growing/shrinking over time.

[0089] While advertisers and other businesses might find the buzz values to be useful,
key marketing feedback data and thus be willing to pay for the data, other buzz
values might be made available to consumers or to the general public for free or
at a nominal cost. For example, the Web site operator might opt to provide general access
to the buzz relating to current movies and rock stars while providing more restrictive
access to data relating to a particular marketing campaign being tracked by the operator
for the company that launched the campaign.

4. a. Buzz/Trend Reports 

[0090] Fig. 8 is a flowchart of one process for generating buzz/trend reports. 

[0091] One example of a buzz report is shown in Fig. 9. That report has a section for
buzz values (normalized from the counts) for overall terms as well as sections for the
categories of movies, music and sports. For each section of the report, the report shows
the top few topics/terms that generated the most counts, in order of number of counts,
along with an indication of relative change in buzz values. When implemented as a
hyperlinked page, the report also includes links to a list of categories (the link is denoted
by "A" in Fig. 9 and the "linked-to" page is illustrated in Fig. 10), icons to change the sort
order, as well as links related to the particular topic/term (e.g., news, categories
and sites relating to the topic/term).

[0092] As shown in Fig. 10, although the counts might be separated in a bin
record by category (see Fig. 3(b)), the counts can also be aggregated over all categories.

[0093] Another buzz report is shown in Fig. 11, where the buzz for terms is plotted over
time and relative to other terms in a category.

[0094] Fig. 12 illustrates yet another buzz report, showing buzz values for
subcategories in a category. As shown in Fig. 12(a), the subcategories are for the category
"Music" and are sorted by percentage change in buzz value. Fig. 12(a) and 12(b) together
form the full report. The portion of the report shown in Fig. 12(b) shows the buzz for
terms over the category "Music". A link to a customization page is
provided ("Preferences") as well as a link to a user-specific buzz index ("My Buzz
Index").

[0095] In general, there are many ways to present the data generated by the traffic
monitor. Buzz values can be "sliced" by demographics to illuminate
demographic information about the users searching for a particular search term. Buzz
values might be sliced by method of access, such as wireless or broadband access. Buzz
values can be presented in various sort orders such as "buzz score" or by the "% change
in buzz" for the time period specified. Buzz report
users can easily determine, for a particular demographic or overall, what topics or
search terms get more attention and where the spikes in attention occur over time.
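
As a minimal sketch of such slicing, assuming each event is stored as a record carrying the
search term together with attributes of the user or session, counts might be grouped by any one
attribute as shown below. The function name slice_counts and the record fields ("term",
"access") are hypothetical and are used only for illustration.

```python
from collections import Counter, defaultdict


def slice_counts(events, attribute):
    """Group event counts per term by the value of one attribute."""
    sliced = defaultdict(Counter)
    for event in events:
        sliced[event[attribute]][event["term"]] += 1
    return sliced


events = [
    {"term": "ford", "access": "wireless"},
    {"term": "ford", "access": "broadband"},
    {"term": "gm", "access": "wireless"},
    {"term": "ford", "access": "wireless"},
]
by_access = slice_counts(events, "access")
```

The same function could slice by a demographic attribute instead of by method of access,
provided the event records carry that attribute.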

[0096] In one application, a buzz report generator generates buzz reports on the fly
based on requests from users. Thus, such users can
request and receive customized views of buzz by segments. A buzz report generated by
the buzz report generator can be presented for any type of user segments that can be
defined by user characteristics, such as demographics, lifestyles, interests and/or
geographic location.

[0097] Demographics of users may also be used as added data rather than just as a
way to slice the data. For example, a demographic report might indicate that of all the
registered users causing events for a given term (i.e., searching using that term), X% are
women, Y% are within the ages of 18-25, etc.

4.b. Selling Advertising Space Based on Categorizations and/or Buzz

[0098] Categorizing search words has many applications, such as selling advertising
space on search results pages for searches on a large number of words. This would
allow a car manufacturer to specify that its advertisement be shown whenever a
search phrase is categorized in a car category. For example, if a visitor searches for
"Dodge" and previous user behavior (over possibly many users) had indicated that
"Dodge" can be categorized in an automobile category, the advertisement would be
shown.
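
The "Dodge" example can be sketched as a simple lookup, assuming the categorization of search
phrases has already been derived from prior user behavior and is available as a mapping. The
function name ads_for_search, the category label "automobiles" and the advertisement identifier
are hypothetical.

```python
def ads_for_search(phrase, phrase_categories, ads_by_category):
    """Return the advertisements sold against any category of the search phrase."""
    categories = phrase_categories.get(phrase.lower(), [])
    return [ad for category in categories for ad in ads_by_category.get(category, [])]


# Assumed inputs: a phrase-to-category mapping and ads sold per category.
phrase_categories = {"dodge": ["automobiles"]}
ads_by_category = {"automobiles": ["car-maker-banner"]}
selected = ads_for_search("Dodge", phrase_categories, ads_by_category)
```

Selling against the category rather than against individual words means the advertiser need not
enumerate every search phrase that falls within the category.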

[0099] Another use of buzz in relation to advertisements is an application that generates
the text and/or other creative components of the advertisement and does so as a function
of the top buzz subjects or products for a category of interest to a visitor to a Web site.
For example, if a visitor to a site demonstrates interest in "rap music", the application
would generate an advertisement that took into account the top buzz for a rap band, such
as generating an advertisement that highlighted the offerings of that top rap band.

4.c. Campaign Monitoring

[0100] Campaign tracking allows users to measure the impact of their marketing
campaigns on generating online buzz. Fig. 13 illustrates a basic campaign monitoring
page. Pre-campaign buzz can be compared with buzz during and after the campaign, as
shown in Fig. 14.

4.d. Intersection Analysis 

[0101] Fig. 15 illustrates intersection analysis. Intersection analysis of the
demographics of users searching for two terms allows users to identify any overlaps
between groups of users searching for multiple terms or brands (e.g., Ford and GM, or
Britney Spears and Christina Aguilera).
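
One way to picture the overlap computation, offered only as a sketch, is as a set intersection
over the users who searched for each term. The function name intersection_analysis and the
representation of users as sets of identifiers are assumptions; the description above does not
state how the overlap is computed.

```python
def intersection_analysis(users_by_term, term_a, term_b):
    """Summarize the overlap between the users searching for two terms."""
    a = users_by_term.get(term_a, set())
    b = users_by_term.get(term_b, set())
    both = a & b
    return {
        "both": len(both),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "overlap_pct_of_a": 100.0 * len(both) / len(a) if a else 0.0,
    }


users_by_term = {"Ford": {1, 2, 3, 4}, "GM": {3, 4, 5}}
overlap = intersection_analysis(users_by_term, "Ford", "GM")
```

Running the same computation within one demographic segment at a time would give the overlap
for that segment.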

4.e. Associated Interests Analysis 
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f01021 Fig. 16 illustrates associated interests analysis. Associated interests analysis 
indicates the other interests of users searching for a particular term. For example, of 
those people searching for Ford, the other terms/categories they are searching for can be 
tracked. 
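
A minimal sketch of this analysis, assuming the set of terms searched by each user is available,
tallies the other terms used by everyone who searched for the term of interest. The function name
associated_interests and the data layout are hypothetical.

```python
from collections import Counter


def associated_interests(terms_by_user, term, top_n=5):
    """Rank the other terms searched by users who searched for `term`."""
    tally = Counter()
    for terms in terms_by_user.values():
        if term in terms:
            # Count each other term once per user.
            tally.update(t for t in set(terms) if t != term)
    return tally.most_common(top_n)


terms_by_user = {
    1: {"ford", "trucks", "nascar"},
    2: {"ford", "trucks"},
    3: {"gm", "trucks"},
}
related = associated_interests(terms_by_user, "ford")
```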

5. Variations on the Basic System 

[0103] The above description is intended as a thorough teaching of how to make and
use a statistics monitor and several exemplary variations. The above description is not
intended to be exhaustive of the possibilities. For example, the above description
generally assumes that the interconnecting media between the users and the monitored
site is the Internet, but the Internet can be replaced with other media without departing
from the scope of the invention, such as a non-TCP/IP network, a local area network
(LAN), an intranet, a virtual private network (VPN), or a wireless-access protocol
(WAP) network. While the above systems may have been explained with reference to
particular criteria for counting, such as only one count per unique user per day, other
criteria might be used, such as incrementing once every time a user causes an event, or
once per user per day.
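
The difference between these counting criteria can be sketched as follows. The class name
EventCounter, its parameter and the choice to key uniqueness on subject, user and day are
assumptions made for illustration, not a statement of how the statistics monitor is built.

```python
from collections import Counter


class EventCounter:
    """Counts events per subject under one of two counting criteria."""

    def __init__(self, once_per_user_per_day=True):
        self.once_per_user_per_day = once_per_user_per_day
        self.counts = Counter()
        self._seen = set()  # (subject, user_id, day) triples already counted

    def record(self, subject, user_id, day):
        if self.once_per_user_per_day:
            key = (subject, user_id, day)
            if key in self._seen:
                return  # this user was already counted for this subject today
            self._seen.add(key)
        self.counts[subject] += 1


events = [
    ("ford", "u1", "2001-05-01"),
    ("ford", "u1", "2001-05-01"),
    ("ford", "u1", "2001-05-02"),
    ("ford", "u2", "2001-05-01"),
]
unique_daily = EventCounter(once_per_user_per_day=True)
every_event = EventCounter(once_per_user_per_day=False)
for event in events:
    unique_daily.record(*event)
    every_event.record(*event)
```

Counting once per unique user per day dampens the effect of a single user repeating a search,
whereas counting every event reflects raw traffic volume.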

[0104] The above description should not be construed to be limiting to particular
computing devices, as the statistics monitor might monitor visits by users with WAP
devices, handheld computers, embedded computers, laptop computers and
Web-enabled devices, to name a few. In a practical system, the monitor might handle
multiple types of devices and might even track statistics by device type or track different
device types differently.

[0105] The pages being viewed need not be HTML, but might be dynamic server pages,
such as ASP pages. Moreover, the "buzz" is not the only statistic that can be
tracked. For example, some other variable can be tracked. In a particular example,
results from the statistics monitor might be used to calculate a charge to the user where
the page views are not free but are related in some way to the statistical results.

[0106] The above description is illustrative and not restrictive. Many variations of the
invention will become apparent to those of skill in the art upon review of this disclosure.
The scope of the invention should, therefore, be determined not with reference to the
above description, but instead should be determined with reference to the appended
claims along with their full scope of equivalents.